# Multimodal Inference
## Qwen2.5 VL 32B Instruct FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: BCCard · Downloads: 140 · Likes: 1

An FP8-quantized version of Qwen2.5-VL-32B-Instruct. It supports vision-text input with text output and is suited to efficient inference scenarios.
## Gemma 3 27b It FP8 Dynamic

- License: Gemma · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: RedHatAI · Downloads: 1,608 · Likes: 1

A quantized version of google/gemma-3-27b-it with weights quantized to the FP8 data type. It takes vision-text input, produces text output, and can be deployed efficiently with vLLM.
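Several of the entries above note that the FP8 checkpoints can be deployed with vLLM. A minimal sketch of what that looks like, assuming a GPU host with vLLM installed; the model id is taken from the listing, but the flags, port, and image URL are illustrative assumptions:

```shell
# Start vLLM's OpenAI-compatible server for one of the FP8 checkpoints
# (context length here is an illustrative choice, not a recommendation).
vllm serve RedHatAI/gemma-3-27b-it-FP8-dynamic --max-model-len 8192

# Once the server is up, send a vision-text chat request
# (the image URL is a placeholder for your own input):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "RedHatAI/gemma-3-27b-it-FP8-dynamic",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            {"type": "text", "text": "Describe this image."}
          ]
        }]
      }'
```

The same pattern applies to the Qwen2.5-VL and Pixtral checkpoints in this list; only the model id changes.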
## Mistral Small 3.1 24B Instruct 2503 Quantized.w4a16

- License: Apache-2.0 · Task: Image-to-Text · Library: Safetensors · Languages: Multilingual
- Publisher: RedHatAI · Downloads: 219 · Likes: 1

An INT4-quantized Mistral-Small-3.1-24B-Instruct-2503, optimized and released by Red Hat (Neural Magic); suited to fast-response conversational agents and low-latency inference scenarios.
## Qwen2.5 VL 7B Instruct FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: RedHatAI · Downloads: 25.18k · Likes: 1

An FP8-quantized version of Qwen2.5-VL-7B-Instruct that supports efficient vision-text inference through vLLM.
## Qwen2.5 VL 3B Instruct FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Transformers · Language: English
- Publisher: RedHatAI · Downloads: 112 · Likes: 1

An FP8-quantized version of Qwen2.5-VL-3B-Instruct that supports vision-text input with text output and improves inference efficiency.
## Pixtral 12b FP8 Dynamic

- License: Apache-2.0 · Task: Image-to-Text · Library: Safetensors · Languages: Multilingual
- Publisher: RedHatAI · Downloads: 87.31k · Likes: 9

pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type cuts disk size and GPU memory requirements by roughly 50%. It is suitable for commercial and research use in multiple languages.
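The "approximately 50%" figure for FP8 follows directly from storage width: FP8 uses one byte per weight versus two for BF16/FP16. A back-of-envelope sketch, assuming a rounded 12B parameter count and ignoring overheads such as activations and the KV cache:

```python
# Back-of-envelope weight-memory estimate (illustrative figures only:
# parameter count is rounded, and runtime overheads are ignored).
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB."""
    return n_params * bytes_per_param / 2**30

n = 12e9  # ~12B parameters, as in pixtral-12b
bf16 = weight_memory_gb(n, 2)  # BF16/FP16: 2 bytes per weight
fp8 = weight_memory_gb(n, 1)   # FP8: 1 byte per weight

print(f"BF16: {bf16:.1f} GiB, FP8: {fp8:.1f} GiB, saving {1 - fp8 / bf16:.0%}")
# → BF16: 22.4 GiB, FP8: 11.2 GiB, saving 50%
```

The same halving applies to the other FP8 checkpoints in this list; the INT4 (w4a16) Mistral entry goes further, at roughly a quarter of the 16-bit weight footprint.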